Implementing FIFO (multiple) Transactions

A fully working external FIFO example using GPIF Single transactions has already been discussed, but the bandwidth it achieves is minuscule because launching each GPIF Single transaction involves a lot of firmware overhead. With GPIF FIFO transactions, the GPIF engine handles bursts of data directly, so a much higher bandwidth over the physical interface is achievable.

Introducing the Flow State Feature of the GPIF

In order to efficiently handle bursts of data and meet burst access timing to the external FIFO, the flow state feature of the GPIF was utilized for the FIFO transaction example. The flow state feature makes its debut in the FX2 GPIF and is a mechanism that allows the GPIF to efficiently throttle data on and off the bus by using an independent set of RDYn logic (flow logic) that is separate from the decision point RDYn logic. Since the flow state feature is an advanced mode of the GPIF, not every application will need to use the flow state. However, handling bursts of data to and from an external FIFO shows the simplest application of the flow state. One very advanced application of the flow state is in the generation of UDMA waveforms for the FX2 mass storage reference design firmware.

In any GPIF waveform, there can only be one flow state, but it can be any of the available non-idle states (S0–S6). The flow state behavior is controlled by a set of registers that are specific to the flow state feature (see the FX2 Technical Reference Manual for flow state register details). One can think of the flow state as being “orthogonal” to one of the GPIF waveform’s states, but it is still the regular decision point logic that is responsible for determining when the flow state should be exited and the normal GPIF waveform behavior continues.

Another property of the flow state is that it can be programmed to drive a different set of CTLx logic than what is described in the GPIF waveform descriptors themselves, which raises the GPIF's level of autonomy another notch. The idea behind the GPIF FIFO Read and Write descriptor programming is to keep the read and write control lines asserted for the duration of the transaction, thereby allowing one data word to be moved on every rising edge of IFCLK. Therefore, a 16-bit interface running at 48 MHz yields an effective burst data rate of 96 MB/s (2 bytes x 48 MHz) over the GPIF interface.

The main difference between this FIFO transaction version and the single transaction version is that waveforms 2 and 3 are used (FIFORd and FIFOWr waveforms, respectively) instead of waveforms 0 and 1. RDY5 is used as the GPIF transaction count (GPIF TC) internal expiration flag (TCXpire). The GPIF TC is what is used in the waveform’s decision point logic to determine when to exit out of the flow state and terminate the waveform.


Figure 16.  Block Diagram for FIFO Transactions
 

Figure 16 shows the set-up of the block diagram and the naming conventions of the CTLx and RDYn signals (same as the single transaction example). Figure 17 below shows waveform 3, which characterizes the behavior of the FIFO Write waveform.


Figure 17. FIFO Write waveform in GPIF Designer

In this FIFO Write waveform (waveform 3) we see that S0 is a period of inactivity, followed by S1, which is designated as the flow state. The decision point logic in S1 looks at the GPIF TC to determine when to terminate the waveform by branching to the IDLE state. As previously mentioned, the flow logic in S1 takes over in the meantime to throttle data on and off the bus and manipulate the CTLx lines. The flow state registers are set up by selecting the various flow state parameters, accessed by right-clicking on the S1 state trace.

In order to set up the flow state for both FIFO reads and writes, a set of global GPIF and flow state registers is first initialized. The values are taken from a FlowStates[36] array in gpif.c, generated by GPIF Designer.

    (Note: The FlowStates array, in gpif.c, could be re-declared as FlowStates[4][9], for simplicity.  The first 9 elements contain the FlowState register values for waveform 0.  The next 9 elements contain the FlowState register values for waveform 1, etc.  Therefore, FlowStates[19] is the same element as FlowStates[2][1], since 2 x 9 + 1 = 19.)
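The index arithmetic in the note can be sketched as a small helper. This is only an illustration: the enum names and the flow_index() function are hypothetical and are not part of the GPIF Designer output. The slot order is inferred from the indices used in this example, and the names for offsets 4 and 5 are assumed to follow the FX2 flow state register address order.

```c
/* Hypothetical names for the nine per-waveform slots of FlowStates[].
   Offsets 0-3 and 6-8 match the indices used in this example; the names
   for offsets 4 and 5 are an assumption based on register address order. */
enum flow_reg {
    FR_FLOWSTATE      = 0,
    FR_FLOWLOGIC      = 1,
    FR_FLOWEQ0CTL     = 2,
    FR_FLOWEQ1CTL     = 3,
    FR_FLOWHOLDOFF    = 4,
    FR_FLOWSTB        = 5,
    FR_FLOWSTBEDGE    = 6,
    FR_FLOWSTBHPERIOD = 7,
    FR_GPIFHOLDAMOUNT = 8,
    FR_COUNT          = 9
};

/* Flat index into FlowStates[36] for a given waveform (0-3) and slot. */
static unsigned flow_index(unsigned waveform, enum flow_reg reg)
{
    return waveform * (unsigned)FR_COUNT + (unsigned)reg;
}
```

With this helper, flow_index(2, FR_FLOWLOGIC) evaluates to 19 and flow_index(3, FR_FLOWSTATE) evaluates to 27, the indices used for the FIFO Read waveform (waveform 2) and the FIFO Write waveform (waveform 3) respectively.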

    EP2GPIFFLGSEL = 0x01; // For EP2OUT, GPIF uses EF flag
    SYNCDELAY;
    EP6GPIFFLGSEL = 0x02; // For EP6IN, GPIF uses FF flag
    SYNCDELAY;

    // global flowstate register initializations
    FLOWLOGIC = FlowStates[19];
    // 0011 0110b - LFUNC[1:0] = 00 (A AND B), TERMA/B[2:0]=110 (FIFO Flag)
    SYNCDELAY;
    FLOWSTB = FlowStates[23];
    // 0000 0100b - MSTB[2:0] = 100 (CTL4), not used as strobe
    SYNCDELAY;
    GPIFHOLDAMOUNT = FlowStates[26];
    // hold data for one half clock (~10.4ns) assuming 48MHz IFCLK
    SYNCDELAY;
    FLOWSTBEDGE = FlowStates[24];
    // move data on both edges of clock
    SYNCDELAY;
    FLOWSTBHPERIOD = FlowStates[25];
    // 20.83ns half period
    SYNCDELAY;

The set-up is such that when FIFO Write transactions are launched from EP2OUT, the GPIF uses EP2’s empty flag (EF) as the FIFO Flag, and when FIFO Read transactions are launched into EP6IN, the GPIF uses EP6’s full flag (FF) as the FIFO Flag.

Subsequently, the flow logic is set up to use the FIFO Flag to throttle data on and off the bus, so the flow state mechanism actually uses EP2EF and EP6FF status to know when to keep writing to the data bus or keep reading from the data bus, respectively.
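To make the FLOWLOGIC value easier to read, here is a minimal decoding sketch. The struct and function names are hypothetical, and the field positions (LFUNC in bits 7:6, TERMA in bits 5:3, TERMB in bits 2:0) are inferred from the comment on the FLOWLOGIC assignment above rather than taken from the register definition, so they should be checked against the FX2 Technical Reference Manual.

```c
#include <stdint.h>

/* Fields of a FLOWLOGIC value, assuming the layout LFUNC[7:6],
   TERMA[5:3], TERMB[2:0] implied by the comment in the listing. */
struct flowlogic {
    uint8_t lfunc; /* logic function applied to the two terms */
    uint8_t terma; /* selector for term A */
    uint8_t termb; /* selector for term B */
};

static struct flowlogic decode_flowlogic(uint8_t value)
{
    struct flowlogic f;
    f.lfunc = (uint8_t)((value >> 6) & 0x03);
    f.terma = (uint8_t)((value >> 3) & 0x07);
    f.termb = (uint8_t)(value & 0x07);
    return f;
}
```

Decoding 0x36 gives lfunc = 0 (A AND B) with both terms equal to 6 (FIFO Flag), so the flow condition reduces to the state of the selected FIFO flag.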

Although CTL4 is not otherwise used in the application, we take advantage of the fact that the flow state can use any of the CTLx lines as a data strobe. At a 48-MHz IFCLK, CTL4 is toggled with a half period of 20.83 ns (one IFCLK cycle). Since the flow state is also programmed to move data on both edges of the data strobe, one word moves per IFCLK cycle, which nicely aligns the data values with the rising edge of IFCLK and achieves a 96-MB/s burst rate over the physical interface. Note that although CTL4 is not physically exposed on the 56-pin package, the flow state logic can still be set up to use it as a data strobe.
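The strobe arithmetic can be checked with a short calculation. The function names below are hypothetical, the half period is expressed in whole IFCLK cycles (one cycle here, matching 20.83 ns at 48 MHz), and 1 MB is taken to be 10^6 bytes.

```c
#include <stdint.h>

/* Data transfers per second for a strobe whose half period lasts
   half_period_clks IFCLK cycles and which moves data on
   edges_per_period (1 or 2) edges of each full strobe period. */
static uint64_t transfers_per_sec(uint64_t ifclk_hz,
                                  unsigned half_period_clks,
                                  unsigned edges_per_period)
{
    return ifclk_hz * edges_per_period / (2u * half_period_clks);
}

/* Burst rate in bytes per second for a bus that is bus_bits wide. */
static uint64_t burst_bytes_per_sec(uint64_t ifclk_hz,
                                    unsigned half_period_clks,
                                    unsigned edges_per_period,
                                    unsigned bus_bits)
{
    return transfers_per_sec(ifclk_hz, half_period_clks, edges_per_period)
           * (bus_bits / 8u);
}
```

For a 48-MHz IFCLK, a one-cycle half period, both strobe edges, and a 16-bit bus, burst_bytes_per_sec(48000000, 1, 2, 16) evaluates to 96000000, that is, 96 MB/s. Moving data on only one strobe edge would halve that figure.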

Let’s also examine the flow state register set-up that is specific to FIFO Writes:

    void Setup_FLOWSTATE_Write ( void )
    {
        FLOWSTATE = FlowStates[18];
        // 1000 0001b - FSE=1, FS[2:0]=001
        SYNCDELAY;
        FLOWEQ0CTL = FlowStates[20];
        // CTL0 = 0 when flow condition equals zero (data flows)
        SYNCDELAY;
        FLOWEQ1CTL = FlowStates[21];
        // CTL0 = 1 when flow condition equals one (data does not flow)
        SYNCDELAY;
    }

Here we designate S1 to be the flow state and define the state of CTL0 when the flow condition equals zero (data flows) and when the flow condition equals one (data does not flow). Remember that the state of the flow condition is determined by the state of EP2EF. So when the EP2 FIFO contains data (EP2 is not empty) the flow condition equals zero, the flow state drops CTL0 LOW (WEN# is asserted), and data is placed on FD[15:0].

Figure 18 below shows waveform 2, which characterizes the behavior of the FIFO Read waveform.


Figure 18. FIFO Read waveform in GPIF Designer

In this FIFO Read waveform (waveform 2) S0 is a period of inactivity, then S1 and S2 set up the “front porch” of the burst transfer, followed by S3, which is designated as the flow state. The decision point logic in S3 looks at the GPIF TC to determine when to terminate the waveform by branching to the IDLE state. As previously mentioned, the flow logic in S3 takes over in the meantime to throttle data reads from the bus and manipulate the CTLx lines.

Let’s examine the flow state register set-up that is specific to FIFO Reads:

    void Setup_FLOWSTATE_Read ( void )
    {
        FLOWSTATE = FlowStates[27];
        // 1000 0011b - FSE=1, FS[2:0]=011
        SYNCDELAY;
        FLOWEQ0CTL = FlowStates[29];
        // CTL1/CTL2 = 0 when flow condition equals zero (data flows)
        SYNCDELAY;
        FLOWEQ1CTL = FlowStates[30];
        // CTL1/CTL2 = 1 when flow condition equals one (data does not flow)
        SYNCDELAY;
    }

Here we designate S3 to be the flow state and define the state of CTL1 and CTL2 when the flow condition equals zero (data flows) and when the flow condition equals one (data does not flow). Remember that the state of the flow condition is determined by the state of EP6FF. So when the EP6 FIFO has room for data (EP6 is not full) the flow condition equals zero, the flow state drops CTL1 and CTL2 LOW (REN and OE are asserted), and data is read from FD[15:0].

Since there is a different flow state register set-up for FIFO read and write operations, the firmware has to call Setup_FLOWSTATE_Read() before launching a GPIF FIFO read transaction, and call Setup_FLOWSTATE_Write() before launching a GPIF FIFO write transaction.
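The two FLOWSTATE values differ only in which state is designated as the flow state. The sketch below shows how such a value is composed and decoded, assuming the layout given in the listing comments (FSE in bit 7, FS[2:0] in bits 2:0). The helper names are hypothetical.

```c
#include <stdint.h>

/* Compose a FLOWSTATE value that enables the flow state feature (FSE=1)
   and designates state number `state` (0-6) as the flow state. */
static uint8_t make_flowstate(unsigned state)
{
    return (uint8_t)(0x80u | (state & 0x07u));
}

/* Nonzero if the flow state feature is enabled in value v. */
static int flowstate_enabled(uint8_t v)
{
    return (v & 0x80u) != 0;
}

/* State number designated as the flow state in value v. */
static unsigned flowstate_number(uint8_t v)
{
    return v & 0x07u;
}
```

Under this layout, make_flowstate(1) gives 0x81 (S1, as used by the FIFO Write waveform) and make_flowstate(3) gives 0x83 (S3, as used by the FIFO Read waveform).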

Now that you understand how the GPIF FIFO read and write waveforms were programmed and set up, the firmware programming for GPIF FIFO transactions can be discussed. 

 

FIFO Transaction Firmware
 

In moving from GPIF Single transactions to GPIF FIFO transactions, the only major difference lies in the TD_Poll() code; the basic underlying architecture of the example remains the same. This section first introduces the basic principles of launching a FIFO transaction, then discusses the TD_Poll() code that triggers the GPIF FIFO transactions.
 

    Triggering GPIF FIFO Transactions

    For triggering GPIF FIFO transactions, we reiterate the concept of the GPIF transaction count (TC). Analogous to the Tcount variable in the single transaction example, the TC is a value the GPIF engine uses to determine how many times to go through a FIFO waveform.

    For example, if the user wished to burst out 512 bytes of data from the EP2OUT endpoint, the TC value would be set to 512 (for byte wide operation) or 256 (for word wide operation). The GPIF engine then decrements the TC value on every push or pop of the FIFO. When the TC value reaches zero, the waveform is complete (a waveform completion is signified by the GPIFDONE bit being set in the GPIFIDLECS register). A decision point state can use the TC value as an internal flag to determine whether or not to branch to the IDLE state. GPIFREADYCFG.5 must be set to allow the GPIF engine to use the RDY5 signal as an internal TC expiration flag.
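The relationship between payload size, bus width, and TC can be sketched as follows. The helper names are hypothetical, and the sketch assumes an even byte count for word wide operation; the byte split mirrors how a 16-bit count is written to a high byte register and a low byte register.

```c
#include <stdint.h>

/* Transaction count for a payload of `bytes` bytes: one count per byte
   in byte wide operation, one count per 16-bit word in word wide
   operation (assumes `bytes` is even in that case). */
static uint32_t gpif_tc(uint32_t bytes, int word_wide)
{
    return word_wide ? bytes / 2u : bytes;
}

/* Byte n (0 = least significant) of a transaction count, as it would be
   written to the corresponding byte wide TC register. */
static uint8_t tc_byte(uint32_t tc, unsigned n)
{
    return (uint8_t)(tc >> (8u * n));
}
```

For a 512-byte packet in word wide operation, gpif_tc(512, 1) is 256, which splits into a high byte of 0x01 and a low byte of 0x00. For a 64-byte full-speed packet the count is 32 (0x20).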

    The act of triggering a GPIF FIFO transaction is actually very simple. Writing to the R/W bit in the GPIFTRIG register sets the direction of the transaction. If R/W=1, a FIFO Read transaction gets triggered when accessing the GPIFTRIG register. If R/W=0, a FIFO Write transaction gets triggered instead.

    For example, to trigger a GPIF FIFO Read transaction to EP6IN use the following line of code:

    GPIFTRIG = GPIFTRIGRD | GPIF_EP6; // launch GPIF FIFO Read transaction to EP6IN

    To trigger a GPIF FIFO Write transaction from EP2OUT use the following line of code:

    GPIFTRIG = GPIF_EP2; // launch GPIF FIFO Write transaction from EP2OUT

    GPIFTRIGRD, GPIF_EP6, and GPIF_EP2 are bit masks that set the appropriate bits in the GPIFTRIG register. Setting the EP[1:0] bits in the GPIFTRIG register to 0, 1, 2, or 3 (corresponding to endpoints 2, 4, 6, and 8, respectively) specifies which endpoint is used in the transaction. Source or sink direction is implied by whether the endpoint is an IN or an OUT endpoint.
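As an illustration of how the trigger value is put together, the sketch below composes and decodes the byte written to GPIFTRIG. The names are hypothetical stand-ins for the masks in the firmware headers, and the position of the R/W bit (bit 2, value 0x04) is an assumption that should be checked against the register definition.

```c
#include <stdint.h>

#define TRIG_RW_READ 0x04u /* assumed R/W bit: 1 = FIFO Read, 0 = FIFO Write */

/* EP[1:0] encodings, in order of endpoints 2, 4, 6, and 8. */
enum trig_ep { TRIG_EP2 = 0, TRIG_EP4 = 1, TRIG_EP6 = 2, TRIG_EP8 = 3 };

/* Value to write to the trigger register for the given endpoint and direction. */
static uint8_t gpiftrig_value(enum trig_ep ep, int is_read)
{
    return (uint8_t)((is_read ? TRIG_RW_READ : 0u) | (unsigned)ep);
}

/* Endpoint number (2, 4, 6, or 8) selected by a trigger value. */
static unsigned trig_endpoint_number(uint8_t value)
{
    return 2u * ((value & 0x03u) + 1u);
}
```

Under these assumptions a FIFO Read to EP6IN is triggered by writing 0x06, and a FIFO Write from EP2OUT by writing 0x00.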

    TD_Init( ) 

    The initialization code in TD_Init( ) remains largely the same as in the single transaction version. The main differences lie in the setup of the FIFOCFG registers. To maximize the USB 2.0 bandwidth, the endpoints are placed into auto mode (AUTOOUT/AUTOIN=1). Note that bits 1 and 0 of the REVCTL register are not set. Therefore, it is necessary to first set AUTOOUT=0, then set AUTOOUT=1, because the FX2 needs to see a 0 to 1 transition of the AUTOOUT bit to automatically arm the endpoint buffers.

    // set the CPU clock to 48MHz
    CPUCS = ((CPUCS & ~bmCLKSPD) | bmCLKSPD1);
    SYNCDELAY;

    EP2CFG = 0xA0; // EP2OUT, bulk, size 512, 4x buffered
    SYNCDELAY;
    EP6CFG = 0xE0; // EP6IN, bulk, size 512, 4x buffered
    SYNCDELAY;

    FIFORESET = 0x80; // set NAKALL bit to NAK all transfers from host
    SYNCDELAY;
    FIFORESET = 0x02; // reset EP2 FIFO
    SYNCDELAY;
    FIFORESET = 0x06; // reset EP6 FIFO
    SYNCDELAY;
    FIFORESET = 0x00; // clear NAKALL bit to resume normal operation
    SYNCDELAY;

    EP2FIFOCFG = 0x01;
    SYNCDELAY;
    EP2FIFOCFG = 0x11; // auto out mode, disable PKTEND zero length send, word ops
    SYNCDELAY;
    EP6FIFOCFG = 0x09; // auto in mode, disable PKTEND zero length send, word ops
    SYNCDELAY;

    GpifInit (); // initialize GPIF registers

    // reset the external FIFO
    OEA |= 0x04; // turn on PA2 as output pin
    IOA |= 0x04; // pull PA2 high initially
    IOA &= 0xFB; // bring PA2 low
    EZUSB_Delay (1); // keep PA2 low for ~1ms, more than enough time
    IOA |= 0x04; // bring PA2 high
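The EPxFIFOCFG values used above can be read as combinations of individual bits. The following sketch uses hypothetical mask names with bit positions inferred from the values and comments in the listing (AUTOOUT = 0x10, AUTOIN = 0x08, WORDWIDE = 0x01), and models the two-step write that gives AUTOOUT its 0 to 1 transition.

```c
#include <stdint.h>

/* Assumed EPxFIFOCFG bit masks, inferred from the values in the listing. */
#define CFG_AUTOOUT  0x10u
#define CFG_AUTOIN   0x08u
#define CFG_WORDWIDE 0x01u

/* Fills seq[0] and seq[1] with the two values to write, in order, to an
   OUT endpoint's FIFOCFG register so that AUTOOUT goes from 0 to 1 while
   the remaining bits keep the value given in other_bits. */
static void autoout_sequence(uint8_t other_bits, uint8_t seq[2])
{
    seq[0] = (uint8_t)(other_bits & ~CFG_AUTOOUT);
    seq[1] = (uint8_t)(other_bits | CFG_AUTOOUT);
}
```

Calling autoout_sequence(CFG_WORDWIDE, seq) yields 0x01 followed by 0x11, the two values written to EP2FIFOCFG above, while CFG_AUTOIN | CFG_WORDWIDE gives the 0x09 written to EP6FIFOCFG.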

    TD_Poll() 

    The OUT handling code first checks whether the GPIF is IDLE. If so, it checks whether there is at least one packet in the peripheral domain for EP2. Since EP2 is placed into auto mode, the firmware does not need to check whether the host sent a USB packet; USB packets are automatically committed for use by the GPIF engine. The firmware's job is therefore only to check that at least one packet has been committed to the peripheral domain.

    Then, if the external FIFO is not full, the TC value is set up for word wide operation: 256 for a 512-byte high-speed packet, or 32 for a 64-byte full-speed packet. The TC is a 32-bit value, but this application needs only the lower 16 bits (GPIFTCB1 and GPIFTCB0). Since each GPIF FIFO Write transaction sends one packet to the external FIFO over a 16-bit interface, the transaction count is always half the number of bytes contained in the endpoint buffer.

    The appropriate flow state registers are then set up for the FIFO Write transaction, and a write to the GPIFTRIG register with the appropriate bits triggers the transaction from EP2OUT. The code then waits for the transaction to complete before exiting the "if" nest.

    // code that handles USB OUT transfers

    if( GPIFTRIG & 0x80 ) // if GPIF interface IDLE
    {
        if ( ! ( EP24FIFOFLGS & 0x02 ) ) // if there's a packet in the peripheral domain for EP2
        {
            if ( EXTFIFONOTFULL ) // if the external FIFO is not full
            {
                if(enum_high_speed)
                {
                    SYNCDELAY;
                    GPIFTCB1 = 0x01; // setup transaction count (512 bytes/2 for word wide -> 0x0100)
                    SYNCDELAY;
                    GPIFTCB0 = 0x00;
                    SYNCDELAY;
                }
                else
                {
                    SYNCDELAY;
                    GPIFTCB1 = 0x00; // setup transaction count (64 bytes/2 for word wide -> 0x20)
                    SYNCDELAY;
                    GPIFTCB0 = 0x20;
                    SYNCDELAY;
                }

                Setup_FLOWSTATE_Write(); // setup FLOWSTATE registers for FIFO Write operation
                SYNCDELAY;
                GPIFTRIG = GPIF_EP2; // launch GPIF FIFO WRITE Transaction from EP2 FIFO
                SYNCDELAY;

                while( !( GPIFTRIG & 0x80 ) ) // poll GPIFTRIG.7 GPIF Done bit
                {
                   ;
                }
                SYNCDELAY;
            }
        }
    }

    Just like the single transaction firmware, if the in_enable flag is not set, the code does not process INs at all.

    If the in_enable flag is set, the code will fall through and check if the GPIF interface is IDLE. It then goes on to check if the external FIFO is not empty. If the external FIFO has data, the code then determines if EP6 has room for at least one more data packet.

    If EP6 has room for at least one more data packet, the TC value is set up for word wide operation: 256 at high speed, or 32 at full speed. The flow state registers are then set up for the FIFO Read transaction, and a write to the GPIFTRIG register with the appropriate bits triggers the transaction to fill the EP6 FIFO. The code then waits for the transaction to complete. Since EP6 is placed into auto mode, there is no need to explicitly write a byte count value to indicate how many bytes to send to the host. FX2 uses the EP6AUTOINLENH/L register values set at enumeration time in the DR_SetConfiguration() function for the auto commit size.

    // code that handles USB IN transfers

    if (in_enable) // if IN transfers are enabled
    {
        if ( GPIFTRIG & 0x80 ) // if GPIF interface IDLE
        {
            if ( EXTFIFONOTEMPTY ) // if external FIFO is not empty
            {
                if ( !( EP68FIFOFLGS & 0x01 ) ) // if EP6 FIFO is not full
                {
                    if (enum_high_speed)
                    {
                        SYNCDELAY;
                        GPIFTCB1 = 0x01; // setup transaction count (512 bytes/2 for word wide -> 0x0100)
                        SYNCDELAY;
                        GPIFTCB0 = 0x00;
                        SYNCDELAY;
                    }
                    else
                    {
                        SYNCDELAY;
                        GPIFTCB1 = 0x00; // setup transaction count (64 bytes/2 for word wide -> 0x20)
                        SYNCDELAY;
                        GPIFTCB0 = 0x20;
                        SYNCDELAY;
                    }

                    Setup_FLOWSTATE_Read(); // setup FLOWSTATE registers for FIFO Read operation
                    SYNCDELAY;
                    GPIFTRIG = GPIFTRIGRD | GPIF_EP6; // launch GPIF FIFO READ Transaction to EP6 FIFO
                    SYNCDELAY;

                    while ( !( GPIFTRIG & 0x80 ) ) // poll GPIFTRIG.7 GPIF Done bit
                    {
                        ;
                    }

                    SYNCDELAY;
                }
            }
        }
    }
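The nested "if" checks in both halves of TD_Poll() amount to two simple predicates. The sketch below restates them as pure functions so the launch conditions can be reasoned about off-target. The function and mask names are hypothetical, and the register values are passed in as plain parameters instead of being read from the FX2 registers.

```c
#include <stdbool.h>
#include <stdint.h>

#define GPIF_DONE_BIT 0x80u /* GPIFTRIG.7: GPIF is IDLE */
#define EP2_EMPTY_BIT 0x02u /* EP2 empty flag as tested in EP24FIFOFLGS */
#define EP6_FULL_BIT  0x01u /* EP6 full flag as tested in EP68FIFOFLGS */

/* True when a FIFO Write (EP2OUT to external FIFO) may be launched. */
static bool can_launch_write(uint8_t gpiftrig, uint8_t ep24fifoflgs,
                             bool ext_fifo_not_full)
{
    return (gpiftrig & GPIF_DONE_BIT) != 0
        && (ep24fifoflgs & EP2_EMPTY_BIT) == 0
        && ext_fifo_not_full;
}

/* True when a FIFO Read (external FIFO to EP6IN) may be launched. */
static bool can_launch_read(bool in_enable, uint8_t gpiftrig,
                            bool ext_fifo_not_empty, uint8_t ep68fifoflgs)
{
    return in_enable
        && (gpiftrig & GPIF_DONE_BIT) != 0
        && ext_fifo_not_empty
        && (ep68fifoflgs & EP6_FULL_BIT) == 0;
}
```

Expressing the conditions this way makes it easy to see that a transaction is launched only when the GPIF is IDLE, the source has data, and the destination has room.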

     

Running the example for GPIF FIFO Transactions

The procedure for running the FIFO transaction example is essentially the same as the Single transaction example. Going through steps 1 through 3 of section 4.1.6 will allow the user to run the FIFO transaction example as well. For running this version of the example, unzip the "FX2_to_extsyncFIFO GPIF FIFO Transactions Auto mode.zip" package instead.

A couple of differences to note are that LED0 will no longer flash when the code is downloaded, and that a few more vendor commands were added for debug purposes. The LED0 code was taken out of TD_Poll() to optimize the firmware execution for FIFO transactions.

Debug Tip:
The use of vendor commands is a "cheap" way to add more debug functionality to the code without incurring unnecessary "printf" statements. With the use of vendor commands, the Keil debugger is not necessary for peeking and poking register values after the fact, which is what most GPIF firmware developers will end up doing. For example, the vendor command 0xb6 was added to the FIFO transaction firmware to read back the status of the GPIF engine. The vendor command returns the 0xb6 request with the value of the GPIFTRIG register. If the GPIF engine has completed a FIFO read or write transaction, the GPIFDONE bit is set, returning a value of 0x80. The screenshot below shows what the user should see in the EZ-USB Control Panel window.


     

Logic Analyzer Traces

These are the traces the user should see on the logic analyzer as the FIFO transaction example runs.  The traces were captured using an HP1660C logic analyzer.

 

FIFO Write: Close-up view of the front porch

This trace shows that the 4 ns data setup time for the external FIFO is satisfied using the X to 0 marker as an indicator. The word consisting of data values 0x02 and 0x03 is written into the external FIFO on the rising edge of IFCLK (the external FIFO's WCLK). While WEN/ is held low, consecutive words are written into the external FIFO on every rising edge of IFCLK. Notice that the GSTATE bus reflects the state of the GPIF engine as it's progressing through the GPIF FIFO Write waveform. S0 is a period of inactivity for 1 IFCLK cycle (20.83 ns) and S1 is the flow state and is active for the entire duration of the data burst phase.

 

FIFO Write: Close-up view of the back porch

Here we see the back end of the 512 byte transfer at a zoomed in level. The last word in the packet consists of data values 0xFE and 0xFF (the end of our ramp test data). Note that a repeated word at the end is not clocked in as the setup time for the WEN/ line is not met prior to the IFCLK edge.

 

 

FIFO Write: Time taken to transfer 512 bytes to the external FIFO

This trace shows how long it takes to write a burst of 512 bytes (256 words) into the external FIFO. At a burst rate of 96 MB/s (one word every IFCLK period), this results in a time of approximately 5.3 microseconds (256 x 20.83 ns) to transfer a payload of 512 bytes. This zoomed out view allows us to see that the GPIF FIFO Write waveform does indeed remain in the flow state until it is done transferring 512 bytes, at which point it transitions to the IDLE state (S7).
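The burst duration quoted above follows directly from the word count and the IFCLK period. A minimal sketch of that calculation, with a hypothetical function name, assuming one word moves per IFCLK cycle and using integer nanoseconds (truncated):

```c
#include <stdint.h>

/* Time in nanoseconds to move `bytes` bytes over a bus that is bus_bits
   wide when one bus word is transferred per IFCLK cycle. */
static uint64_t burst_time_ns(uint32_t bytes, unsigned bus_bits,
                              uint64_t ifclk_hz)
{
    uint64_t words = bytes / (bus_bits / 8u);
    return words * 1000000000ULL / ifclk_hz;
}
```

For 512 bytes on a 16-bit bus at 48 MHz, burst_time_ns(512, 16, 48000000) evaluates to 5333 ns, which agrees with the approximately 5.3 microseconds seen in the trace.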

 

 

FIFO Write: Inter-packet transfer time

In this trace we examine the inter-packet transfer time between consecutive OUTs sent by the host. Notice that the FX2 has approximately 20 microseconds to spare before it has to burst out the next OUT packet. In other words, the FX2 is waiting on the host, so the host is the limiting factor.

 

 

FIFO Read: Close-up view of the front porch

This trace shows that the 9.2 ns data setup time for the GPIF is satisfied using the X to 0 marker as an indicator. The word consisting of data values 0x00 and 0x01 is read from the external FIFO on the rising edge of IFCLK (the external FIFO's RCLK). While REN/ is held low, consecutive words are read from the external FIFO on every rising edge of IFCLK. Notice that the GSTATE bus reflects the state of the GPIF engine as it's progressing through the GPIF FIFO Read waveform. S0 is a period of inactivity for 1 IFCLK cycle (20.83 ns). In S1, the REN/ is asserted since the external FIFO requires that the REN/ be setup tENS before the OE/ line is asserted. S2 asserts the OE/ line, and S3 is the flow state and is active for the entire duration of the data burst phase.

 

 

FIFO Read: Close-up view of the back porch

Here we see the back end of the 512 byte transfer at a zoomed in level. The last word in the packet consists of data values 0xFE and 0xFF (the end of our ramp test data). Note that a repeated word at the end is not clocked in as the setup time for the REN/ line is not met prior to the IFCLK edge.

 

 

FIFO Read: Time taken to read 512 bytes from the external FIFO

This trace shows how long it takes to read a burst of 512 bytes (256 words) from the external FIFO. At a burst rate of 96 MB/s (one word every IFCLK period), this results in a time of approximately 5.3 microseconds (256 x 20.83 ns) to transfer a payload of 512 bytes. This zoomed out view allows us to see that the GPIF FIFO Read waveform does indeed remain in the flow state until it is done transferring 512 bytes, at which point it transitions to the IDLE state (S7).

 

 

FIFO Read: Inter-packet transfer time

In this trace we examine the inter-packet transfer time between consecutive INs requested by the host. Notice that the FX2 has approximately 20 microseconds to spare before it has to fulfill the next IN request. In other words, the FX2 is waiting on the host, so the host is the limiting factor.

 

 

Bulk Loopback: FIFO Reads and Writes

The user will observe the above waveform when the bulkloop utility is exercised. This trace shows activity that includes both reads and writes to the external FIFO. Notice that the host schedules INs and OUTs evenly; no favoritism is shown to either type of transfer.

 

 

Summary

This design example of a 16-bit interface to an external synchronous FIFO has brought to the forefront many GPIF programming fundamentals, such as determining GPIF hardware connections, creating GPIF single and FIFO waveform descriptors using GPIF Designer, and launching GPIF single and FIFO transfers in firmware. The user should now have a firm grasp of what it takes to create a full-featured GPIF application, and how to go from a simple set of firmware that uses GPIF single transactions to a more complex and robust application that uses GPIF FIFO transfers. Also, by now the user should be aware that the logic analyzer is a GPIF programmer's best friend. Let's extend the basic toolset the user should already have by presenting a more complex design example using a TI DSP.